SUMMARY

Hematopoietic stem cells (HSCs) maintain blood homeostasis and are the functional units of bone marrow transplantation. To improve
the molecular understanding of HSCs and their proximal progenitors, we performed transcriptome analysis within the context of the
ImmGen Consortium data set. Gene sets that define steady-state and mobilized HSCs, as well as hematopoietic stem and progenitor cells
(HSPCs), were determined. Genes involved in transcriptional regulation, including a group of putative transcriptional repressors, were
identified in multipotent progenitors and HSCs. Proximal promoter analyses combined with ImmGen module analysis identified candidate
regulators of HSCs. Enforced expression of one predicted regulator, Hlf, in diverse HSPC subsets led to extensive self-renewal activity
ex vivo. These analyses reveal unique insights into the mechanisms that control the core properties of HSPCs.



INTRODUCTION 

Hematopoietic stem cells (HSCs) reside at the apex of the 
hematopoietic hierarchy and generate the entire reper- 
toire of highly specialized hematopoietic effector cells by 
differentiating through a succession of increasingly 
committed progenitors. HSCs are the only hematopoietic 
cell type that can differentiate into all blood lineages and 
self-renew for life. These properties, along with HSCs'
remarkable ability to engraft conditioned recipients 
upon intravenous transplantation, have established the 
clinical paradigm for the application of stem cells in 
regenerative medicine. Indeed, HSC transplantation is 
routinely used to treat a variety of hematological condi- 
tions, including leukemia, multiple myeloma, severe com- 
bined immunodeficiency, and myelodysplastic syndrome. 
Nonetheless, HSC transplantation remains a relatively
high-risk procedure, with the most significant factor
contributing to the success of the procedure being the
size of the transplanted graft (Siena et al., 2000). Enormous
efforts have therefore been mounted to develop methods 
for expanding HSCs ex vivo, although these efforts have 
not yet translated to the clinic. A greater understanding 
of the molecular mechanisms underlying HSC fate and 



function will undoubtedly inform strategies for the thera- 
peutic manipulation of these cells, and may also improve 
our understanding of hematopoietic malignancies derived 
from stem cells.

The ability to purify HSCs to near homogeneity opens 
the door for their precise molecular characterization by 
microarray analysis. This approach is particularly useful 
for studying HSCs because it allows for the simultaneous,
quantitative detection of entire transcriptomes
from these rare cells. Although prior microarray studies 
have provided useful insights into HSC biology (Cham- 
bers et al., 2007; Forsberg et al., 2005, 2010; Rossi 
et al., 2005), it has proved challenging to cross-analyze 
data due to differences in experimental designs and 
technical methodologies. The ImmGen Project over- 
comes many of these limitations by generating transcrip- 
tome data from stem cells, defined progenitors, and 
various effector cells, using unified protocols of cell sort- 
ing, RNA extraction, unamplified sample preparation, 
and a common facility for microarray processing (Heng 
and Painter, 2008; Painter et al., 2011). Additional 
advantages of the ImmGen approach include a wider 
breadth of assayed hematopoietic cell types and states 
(~250), increased statistical power through array number 
Figure 1. Population Distances Define HSPCs in Transcriptional Space

(A) Population-distance analysis of microarray data presented in three principal components (PCs 1-3). Each point represents a single
array. Cell types are color-coded. B, B cells; DC, dendritic cells; GN, granulocytes; MF, macrophages; Mo, monocytes; NK, NK cells; NKT, NKT
cells; preT, T cell precursors; proB, B cell precursors; T, T cells; Tgd, γδ T cells.

(B) Population-distance analysis of HSPC subsets including HSCs, MPPs (MPP1 and MPP2), and oligopotent progenitors (CLP, CMP, MEP,
and GMP).

See also Figure S1.



(~700 total), and utilization of the Affymetrix GeneChip 
Mouse Gene ST 1.0 microarray platform, which includes 
probes for >24,500 coding and >1,300 noncoding
transcripts. 

Here, we used the breadth of the ImmGen data set to 
delineate genes and regulators of the primitive hematopoi- 
etic cells, bringing to light conceptual advances at three 
levels of resolution: (1) hematopoietic stem and progenitor 
cells (HSPCs), (2) multipotent stem and progenitor cells, 
and (3) HSCs. All HSPCs showed enriched expression 
of metabolic growth- and proliferation-associated genes, 
which paradoxically were also expressed in quiescent 
HSCs. Genes encoding transcription factors, including a 
group of Krüppel-associated box (KRAB) domain-containing
C2H2 zinc-finger proteins that are predicted to function
as transcriptional repressors, were enriched in multipotent 
progenitors (MPPs) and HSCs. Exposure to clinically rele- 
vant mobilizing stimuli led to alterations in the expression 
of HSPC regulators, as well as membrane and extracellular 
matrix proteins and proteases. Proximal promoter analysis 
of genes identified in steady-state HSPCs and mobilized 
HSPCs (moHSPCs) revealed enrichment of motifs repre- 
senting putative binding sites for both known and un- 
known stem cell regulators, and ImmGen module analysis 
of HSC-enriched genes independently identified potential 
regulators. Enforced expression of one putative regulator, 
Hlf, resulted in robust induction of a primitive immunophe- 
notype, sustained colony-formation activity, and enhanced 
self-renewal in a number of progenitor subsets ex vivo. 



RESULTS 

Comparative Transcriptional Distances between 
Primitive HSPCs 

The generation of effector blood cells from HSCs proceeds 
through a series of downstream progenitors with increas- 
ingly restricted potential (Bryder et al., 2006). The most 
proximal progenitors to HSCs are MPPs, which retain full 
lineage potential but lack long-term self-renewal potential. 
As MPPs differentiate, they give rise to oligopotent progen- 
itors of either lymphoid or myeloid effector cells. To 
generate transcriptome data from primitive subsets, we 
sorted HSCs, MPPs (MPP1 and MPP2), and oligopotent
progenitors (common myeloid progenitor [CMP],
granulocyte-macrophage progenitor [GMP],
megakaryocyte-erythroid progenitor [MEP], and common lymphoid
progenitor [CLP]) to a high degree of purity (for sorting details,
see Table S1 available online and
http://www.immgen.org/index_content.html) and subjected them to ImmGen
expression profiling and quality-control pipelines (Heng
and Painter, 2008). Hereafter, we refer collectively to these
primitive subsets as HSPCs. Principal component analysis
(PCA) was performed on the 20% most variable genes
between HSPCs and their downstream progeny (Figure 1A).
Strikingly, all HSPC subsets clustered closely together in 
relation to their downstream progeny, indicating that 
hematopoietic progenitors as functionally diverse as 
HSCs, CLPs, and CMPs share gene expression properties 
that commonly define them in transcriptional space. 
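
The gene-selection step that precedes PCA can be sketched as follows. This is a minimal illustration only: the function name `top_variable_genes`, the use of population variance as the variability measure, and the toy values are assumptions, not the ImmGen pipeline. The retained genes would then be passed to any standard PCA routine.

```python
from statistics import pvariance


def top_variable_genes(expr, fraction=0.20):
    """Return the names of the most variable genes.

    expr maps a gene name to its expression values, one per array.
    Variability is measured here by population variance (an assumption;
    the source does not state which measure was used).
    """
    n_keep = max(1, round(len(expr) * fraction))
    ranked = sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)
    return ranked[:n_keep]


# Hypothetical toy matrix: five genes measured on four arrays.
toy = {
    "geneA": [1.0, 1.1, 0.9, 1.0],
    "geneB": [2.0, 8.0, 1.0, 9.0],
    "geneC": [5.0, 5.0, 5.1, 4.9],
    "geneD": [3.0, 6.0, 3.0, 6.0],
    "geneE": [7.0, 7.2, 6.8, 7.0],
}
selected = top_variable_genes(toy)             # keeps 1 of 5 genes
selected_40 = top_variable_genes(toy, 0.40)    # keeps 2 of 5 genes
```

Restricting PCA to the most variable genes reduces the influence of uniformly expressed transcripts on the leading components.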
Figure 2. HSPCs Are Enriched for Gene Sets that Enable Transit Amplification

(A) Reduced representation of hematopoiesis showing normalized and averaged values of 1,605 HSPC-enriched genes.

(B) DAVID analysis showing enriched categories, with adjusted p value (Benjamini).
We next interrogated the transcriptional relationships
among hematopoietic progenitors by performing PCA
using the 20% most variable genes between the
HSPC subsets to define transcriptional distances. In agreement
with established functional relationships (Bryder
et al., 2006), MPP1 cells were positioned most proximal to
HSCs, followed by MPP2, whereas oligopotent lymphoid
and myeloid progenitors radiated farther along the principal
components (Figure 1B).

HSPCs Are Transcriptionally Enriched for Genes 
Associated with Transit Amplification 

Though HSPCs represent a group of progenitors with divergent
functional attributes, the relatedness of their transcriptomes
(Figure 1A) prompted us to determine whether
we could identify a set of genes commonly expressed across 
diverse HSPC subsets. We therefore analyzed the combined 
HSPC subsets in comparison with their downstream 
hematopoietic progeny by one-way ANOVA (false discov- 
ery rate [FDR] < 5%, p < 1 x 10"^) and identified 1,605 
genes with enriched expression in HSPCs (Figure S1A;
Table S2). A reduced representation of relative expression, 
averaged from all 1,605 genes, showed high expression 
in HSCs, MPPs, and oligopotent myeloid progenitors, 
and a lower level of induction within CLPs (Figure 2A). 
We next tested for functional enrichment in the Database 
for Annotation, Visualization and Integrated Discovery 
(DAVID) bioinformatics resource (http://david.abcc.ncifcrf.gov/),
which revealed significant enrichment for genes
associated with metabolic growth (ncRNA-metabolic, 
tRNA-metabolic, and Ribosomal subunits) and prolifera- 
tion (cell cycle, DNA-metabolic, and M-phase; Fisher's 
exact test, FDR < 4 x 10~^; Figure 2B), consistent with 
the high cycling activity and transit amplification potential 
of these progenitors. 
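
An FDR cutoff such as the one applied above can be illustrated with the Benjamini-Hochberg adjustment. This is a sketch under an assumption: the source names the Benjamini correction only for the DAVID output and does not state which FDR procedure was used for the ANOVA. The function name and p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotonic in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values from a one-way ANOVA.
pvals = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(pvals)
significant = [p < 0.05 for p in adj]  # genes passing FDR < 5%
```

In this toy case only the first gene passes the 5% threshold, although three raw p-values lie below 0.05.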

To visualize the relative expression of genes identified by 
ANOVA across the ImmGen data set, we normalized the 
expression values of the genes in either the ncRNA meta- 
bolic or cell cycle groups and plotted the average expres- 
sion for each cell type (Figure 2C). This analysis showed 
that in addition to HSPCs, these gene sets are also highly 
expressed in early B and T cell progenitors (Figure 2C;
Figures S1B and S1C), in line with the proliferative potential
of these precursors (Carpenter and Bosselut, 2010). In 
contrast, effector cells such as granulocytes, dendritic cells, 
and natural killer (NK) cells showed markedly lower expres- 
sion, consistent with their terminally differentiated state. 
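
The "normalized and averaged" summary of a gene set can be sketched as a per-gene z-score followed by averaging within each cell type. The choice of z-score normalization is an assumption (the source does not specify the method), and the function name, gene names, and values are hypothetical.

```python
from statistics import mean, pstdev


def gene_set_score(expr, gene_set):
    """Average per-gene z-scores for each cell type.

    expr maps gene -> {cell_type: expression value}. Each gene is
    z-scored across cell types, then z-scores are averaged per cell type.
    """
    cell_types = sorted(next(iter(expr.values())))
    totals = {c: 0.0 for c in cell_types}
    used = 0
    for gene in gene_set:
        values = [expr[gene][c] for c in cell_types]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a flat gene carries no relative information
        used += 1
        for c, v in zip(cell_types, values):
            totals[c] += (v - mu) / sd
    return {c: totals[c] / used for c in cell_types} if used else {}


# Hypothetical toy data: two genes, two cell types.
toy_expr = {
    "g1": {"HSC": 10.0, "GN": 2.0},
    "g2": {"HSC": 8.0, "GN": 4.0},
}
scores = gene_set_score(toy_expr, ["g1", "g2"])
```

Normalizing each gene before averaging prevents a few highly expressed transcripts from dominating the set-level summary.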



Interestingly, HSCs showed relatively high expression of 
metabolic growth and proliferation gene sets (Figure 2C), 
despite the fact that they are largely quiescent in adults 
(Bowie et al., 2006; Rossi et al., 2007; Wilson et al., 2008).
To further explore this apparent paradox, we plotted the 
relative expression of genes in the ncRNA-metabolic, cell 
cycle, and DNA-metabolic categories in a limited subset 
of cell types, reasoning that this might allow us to discrim- 
inate between genes that encompass both positive and 
negative regulators of cell proliferation (found in the cell 
cycle gene sets) and genes that are more tightly linked to 
DNA synthesis (found in the DNA metabolic gene sets; Fig- 
ure 2D). Surprisingly, although the HSCs showed a slight 
relative decrease in expression of genes associated with 
DNA metabolism in comparison with other HSPC subsets, 
they nonetheless exhibited relatively high expression of 
genes in these categories. These data raise the possibility 
that even though they reside predominantly in the quiescent
G0 phase of the cell cycle, HSCs are nonetheless transcriptionally
poised to enter the cell cycle by expression of
genes that mediate cell-cycle progression. This postulate 
implies that active maintenance of quiescence is a requisite 
feature of adult HSCs, a notion that has been borne out in 
studies that have defined regulators that hold HSCs in a 
quiescent state. To explore this concept further, we exam- 
ined the expression of a subset of positive and negative reg- 
ulators of the cell cycle (Figure 2E). Interestingly, whereas 
HSCs clearly showed robust expression levels of cell-cycle 
drivers such as Cdk2, Cdk4, and Cdk6, the only canonical 
Cdk inhibitor with high expression in HSCs was Cdkn1c,
which encodes p57, a protein that was recently shown to
regulate HSC quiescence (Matsumoto et al., 2011; Zou 
et al., 2011). The Rb family members also showed expres- 
sion (albeit nonpreferential) in HSCs (Figure 2E), in 
agreement with their combined role in regulating HSC 
quiescence (Viatour et al., 2008). 

Cumulatively, these results demonstrate that HSPCs 
exhibit elevated expression of genes consistent with their 
high cycling activity, and suggest that HSC quiescence is 
a poised state in which genes and pathways required for 
cell-cycle entry and growth are expressed. 

Identification of a Group of C2H2 Zinc-Finger KRAB
Domain-Containing Transcriptional Repressors in 
Multipotent Stem and Progenitor Cells 

To identify genes and pathways enriched in hematopoietic 
multipotency, we analyzed multipotent stem/progenitors 



(C) Normalized and averaged values for the indicated categories across the ImmGen data set. Cell types were grouped as indicated.

(D) Normalized and averaged values for the indicated categories in ImmGen data sets.

(E) Heatmap of positive and negative regulators of cell cycle in the indicated cell types.
See also Figure S2.
Figure 3. Hematopoietic Multipotent Stem and Progenitor Cells Express a Family of KRAB Domain-Containing Zinc-Finger
Transcriptional Repressors

(A) Reduced representation of hematopoiesis showing normalized and averaged values of 433 MPP-enriched genes.

(B) DAVID analysis showing enriched categories, with adjusted p value (Benjamini).

(C) Graphs showing the linear values (averaged array replicates ± SEM) of the indicated genes along differentiation trajectories from HSC
to MEP (green), GMP (red), PreB (purple), and PreT (blue). Biological replicates: n = 2 (MPP1, MPP2, and MEP), n = 3 (HSC, CMP, GMP, PreB,
and PreT), and n = 4 (CLP).

See also Figure S3.



(HSCs, MPP1, and MPP2) as a group in comparison with
their downstream progeny, and identified 443 genes 
with enriched expression (one-way ANOVA, FDR < 5%, 
p < 1 X 10~^; Table S3; Figure S2). Reduced representation 
of expression showed the highest relative expression in 
HSCs, followed by MPP1s and MPP2s (Figure 3A). DAVID
analysis revealed a significant overrepresentation of 
genes encoding KRAB domain-containing proteins and 
C2H2 zinc-finger domain-containing proteins (Figure 3B). 
When present in proteins that also contain DNA-binding 
domains, KRAB domains canonically function to recruit 
transcriptional repressors (Urrutia, 2003), and since all 
of the KRAB domain-containing proteins we identified 
also contain C2H2 zinc-finger DNA-binding domains, it 



is predicted that these proteins function as transcriptional 
repressors. To visualize their expression at increased 
resolution, we focused our analysis on how their expres- 
sion levels change during differentiation between stem 
cell and defined downstream progenitor cell populations 
(Figure 3C; Figure S3). The preferential expression of 
these putative transcriptional repressors in primitive 
progenitors that possess multilineage differentiation 
capacity raises the possibility that they may be involved 
in maintaining hematopoietic multipotency through 
KRAB-mediated suppression of lineage commitment 
pathways in a general (e.g., Zfp826 and Zfp12) or
lineage-specific (e.g., Gm14420, A630089N07Rik, Zkscan1,
and Zfp266) manner.
Transcriptional Regulation of HSCs

We next sought to identify genes and pathways that 
might uniquely regulate HSCs within the hematopoietic 
system. To achieve this, we compared the transcriptome 
of HSCs with all other hematopoietic cell types in the 
ImmGen data set and identified 322 genes with enriched 
expression in HSCs (one-way ANOVA, FDR < 5%; Figures 
4A and 4B; Table S4). Functional annotation by DAVID
showed that most genes could be grouped into a limited 
number of categories, whereas 43% (138/322) of the iden- 
tified HSC genes remained uncharacterized in any cell 
type (Figure 4B). The 322 HSC-enriched genes were signif- 
icantly enriched for KRAB domain-containing proteins 
and C2H2 zinc-finger transcription factors, as observed 
in the broader multipotent stem and progenitor cell 
analysis. In total, 51 of the 322 HSC-enriched genes 
were identified as transcription regulators (Figure 4C) 
whose HSC-enriched expression proved to be conserved 
between mouse and human (Figure 4D), and included 
known HSC regulators such as Meis1, Mecom/Evi-1, Ndn,
MycN, and HoxA9 (Figures 4C and 4D). To explore the 
interrelationships of these factors, we constructed a 
functional gene network using a context likelihood of 
relatedness (CLR)-based method (Faith et al., 2007) and 
the entire ImmGen data set to derive connections be- 
tween genes in this network representing nonrandom 
and statistically significant dependencies. Strikingly, of 
the 51 HSC-enriched transcription factors we identified, 
48 segregated into two distinct clusters (Figure 4E). Inter- 
estingly, all factors that were previously reported to oper- 
ate functionally in HSCs fell into one network cluster, 
suggesting that these genes may be under a common reg- 
ulatory architecture (Figure 4E). 

To clarify regulators of HSC-specific gene expression, we 
next used de novo motif discovery (MEME) (Machanick 
and Bailey, 2011) to analyze the proximal promoters of 
the 322 HSC-enriched genes, defined as ±1,000 bp from 
the transcription start sites (TSSs). We identified four 
motifs, which TOMTOM analysis recognized as putative 
binding sites of a number of transcription factors (Fig- 
ure 4F). The most significant motif is a putative binding 
site of EGR1, which was previously demonstrated to regulate
HSC quiescence and retention in bone marrow (BM)
(Min et al., 2008). The second motif is a predicted binding 
site for SOX4, which is reported to enhance murine HSC 
reconstitution potential (Deneault et al., 2009). The third 
motif is a predicted binding site for aryl hydrocarbon recep- 
tor (AHR), which is striking in light of a recent report 
demonstrating ex vivo expansion of HSCs using a purine 
derivative that acts as an AHR agonist (Boitano et al., 
2010). The fourth motif is predicted to bind STAT1, which
is required for interferon-induced activation of HSCs
(Essers et al., 2009).
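
The proximal promoter definition used above (±1,000 bp around the TSS) can be sketched as a simple window extraction. The function name, the 0-based coordinate convention, and the toy sequence are assumptions for illustration; the source does not describe how sequences were retrieved before motif discovery.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def promoter_sequence(chrom_seq, tss, strand, flank=1000):
    """Return the sequence spanning tss - flank .. tss + flank.

    chrom_seq is the chromosome sequence, tss is a 0-based index into it,
    and strand is "+" or "-". Windows are clipped at chromosome ends, and
    minus-strand genes are reported as the reverse complement.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    start = max(0, tss - flank)
    end = min(len(chrom_seq), tss + flank + 1)
    window = chrom_seq[start:end]
    if strand == "-":
        window = window.translate(COMPLEMENT)[::-1]
    return window


# Hypothetical 12-bp "chromosome" with a 2-bp flank for readability.
chrom = "AACCGGTTAACC"
plus_window = promoter_sequence(chrom, tss=5, strand="+", flank=2)
minus_window = promoter_sequence(chrom, tss=5, strand="-", flank=2)
edge_window = promoter_sequence(chrom, tss=0, strand="+", flank=2)
```

Reporting minus-strand promoters as the reverse complement keeps every sequence in the same orientation relative to transcription.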



To further explore the potential regulatory network of 
HSCs, we utilized module analysis
(http://www.immgen.org/ModsRegs/modules.html), which identifies putative
transcriptional regulators based on coexpression across 
the ImmGen data sets. This analysis was undertaken with 
the broader ImmGen data set that also includes nonhema- 
topoietic cell types (e.g., stromal and endothelial cells). 
Four modules were significantly enriched for the
HSC-induced genes (hypergeometric, p < 0.001; Figure 5A),
and each showed a pattern of high expression in stem cells 
and downregulation upon hematopoietic differentiation. 
Interestingly, the most enriched module (#40) also showed 
relatively high expression of a subset of HSC genes in endo- 
thelial cells (Figure 5B; Figure S4A). This unexpected 
finding may reflect the developmental origin of HSCs, 
which are derived from a population of fetal hemogenic 
endothelial cells (Dzierzak and Speck, 2008). The module 
analysis also predicted 32 regulators for the four
HSC-enriched modules (Figure 5C; Figure S4B) and included
STAT1 and SOX4, which we had identified based on
enriched sequence motifs (Figure 4F). Some of the predicted
regulators (e.g., HoxA9 and Mecom) showed restricted 
expression to the primitive hematopoietic compartment, 
whereas others showed broader expression. The latter 
group included established HSC regulators, such as Gata2, 
MycN, and Erg, that showed high expression not only in 
HSCs but also in endothelial cells (Figure 5C), consistent 
with their established functional roles in both cell types 
(Göttgens et al., 2002; Linnemann et al., 2011; Ng et al.,
2011; Sato, 2001). Interestingly, the four enriched tran- 
scription factor binding motifs we identified in HSCs (Fig- 
ure 4F) are predicted to bind factors that are expressed in 
both HSCs and endothelial cells (Egr1, Sox4, Ahr, and
Stat1), suggesting a shared regulatory program.
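
The hypergeometric test used to score module enrichment can be sketched as an upper-tail probability. The function name and the toy numbers are hypothetical, and the gene universe used in the actual analysis is not stated in the source.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, overlap):
    """Return P(X >= overlap) for a hypergeometric variable X.

    population: genes in the universe
    successes:  genes in the set of interest (e.g., HSC-enriched genes)
    draws:      genes in the module being tested
    overlap:    genes shared by the set and the module
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy case: 10 genes in total, 4 of interest,
# a module of 3 genes, 2 of which are of interest.
p_toy = hypergeom_enrichment_p(population=10, successes=4, draws=3, overlap=2)
```

A small p-value indicates that the module shares more genes with the set of interest than expected from random sampling without replacement.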

G-CSF Mobilization Induces Common Transcriptional 
Changes in HSCs and MPPs 

In adult mice and humans, a small percentage of HSCs and 
progenitor cells migrate periodically from the BM niche 
into the circulation (Massberg et al., 2007; Min et al., 
2008; Wright et al., 2001b). The frequency of HSCs in 
the circulation increases significantly in response to 
inflammation and following administration of mobilizing 
agents. In particular, treatment of mice or humans 
with a combination of cyclophosphamide/granulocyte col- 
ony-stimulating factor (G-CSF; Cy/G) drives rapid prolifer- 
ation, expansion, and migration of HSPCs from the BM to 
peripheral hematopoietic compartments (Morrison et al., 
1997; Neben et al., 1993; Passegue et al., 2005), and
mobilization is routinely used in clinical practice to collect cells
for transplantation. However, the molecular regulators that 
control HSC expansion and migration during this process 
remain elusive. 
Figure 4. Identification of HSC-Specific Transcriptional Regulators

(A) Reduced representation of hematopoiesis showing normalized and averaged values of 322 HSC-enriched genes.

(B) Heatmap of all HSC-enriched genes across hematopoiesis. Functional classification as determined by DAVID.

(C) Expression of transcriptional regulators enriched (>4-fold) in murine HSCs presented as a ratio of mean expression in HSCs over the
mean expression in all other ImmGen cell types.

(D) Expression of the orthologs in (C) in human HSCs (Novershtern et al., 2011).
Figure 5. ImmGen Module Analysis Identifies Putative Regulators of HSCs

(A) Graph showing modules (identifier numbers) significantly enriched with HSC-specific genes. Number of common genes and
hypergeometric p values are indicated.

(B) Heatmap showing the averaged normalized expression of HSC genes in module #40.

(C) Absolute expression of HSC regulators predicted by ImmGen module analysis. Log2 values are shown.
See also Figure S4.



HSC^SLAM and MPP^SLAM subsets were harvested from Cy/G-treated
mice (referred to hereafter as moHSC^SLAM and
moMPP^SLAM, respectively; Figure 6A), and RNA harvested
from these cells was compared with RNA extracted from
steady-state HSC^SLAM and MPP^SLAM (Table S1). Notably, the
cell purification strategy used for these mobilization ana- 
lyses was different from the one used in the previous ana- 
lyses (Figures 1, 2, 3, 4, and 5), due to the availability of ex- 



isting functional data that validated these marker sets for 
isolation of the relevant cell populations from mobilized 
mice. These samples were also processed with an amplifica- 
tion step and therefore were analyzed separately from the 
broad ImmGen data set. Importantly, despite the differ- 
ences in immunophenotype, multiparameter fluores- 
cence-activated cell sorting analyses and mean class expres- 
sion analyses revealed that SLAM-code (Kiel et al., 2005) 



(E) Connectivity map based on correlated expression showing the 51 identified HSC-enriched transcriptional regulators, with known 
regulators of HSCs highlighted in orange. TF1 = 2810021G02Rik, TF2 = 2610008E11Rik, TF3 = A630033E08Rik, and TF4 = 10305D13Rik.

(F) Significantly enriched sequence motifs ± 1,000 bp of TSS in HSC-enriched genes, showing enrichment values (E values) and predicted 
binding factors. 
Figure 6. MoHSPCs Express a Defined Gene Signature

(A) Schematic of the Cy/G treatment used to mobilize HSPCs. Mice were injected with a single dose of cyclophosphamide (Cy; 4 mg/mouse,
i.p.), followed by two daily G-CSF (G; 5 µg/mouse) injections (D2 Cy/G treatment). HSC^SLAM and MPP^SLAM were sorted from untreated and D2
Cy/G-treated mice for RNA extraction and microarray hybridization.

(B) Multiplot analysis to identify differentially expressed genes between each comparison (Hochberg test; FDR < 10%, fold change > 1.5).

(C) Heatmap of differentially expressed genes in moHSPC versus steady-state HSPC (FDR < 10%).

(D) Statistically significant transcription factor binding motifs (TFBs; in the upstream regulatory region and TSS [±1,000 bp]) of
differentially expressed genes. The putative TF family binding motif and the p value before the null model correction are noted.

(E) Table of the known upstream regulators of genes in the data set identified by the Ingenuity knowledge base (p < 0.05, right-tailed
Fisher's exact test).

See also Figures S5 and S6.
HSCs (LSKCD48-CD150+) showed significant overlap with
HSCs defined as LSKFlk2-CD34- (Figure S5A), and expression
profiling revealed that the vast majority of genes are
similarly expressed in HSCs purified by either strategy
(Pearson correlation = 0.997; Figure S5B), with only 24
probe sets exhibiting significantly differential expression
(FDR < 10%, fold change > 2; Figure S5C). Moreover, PCA
of the 20% most variable genes across these populations
showed that HSC^SLAM and MPP^SLAM positioned closely to
the LKSCD34-Flk2- HSCs and MPP1s, respectively
(Figure S5D), consistent with the previously ascribed
immunophenotypic and functional overlap of these populations
(Bryder et al., 2006).
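
The Pearson correlation used to compare HSCs purified by the two strategies can be sketched in a few lines. The function name and the profiles below are hypothetical toy values, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical mean expression of four genes under two sorting strategies.
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [2.0, 4.0, 6.0, 8.0]
r_same = pearson(profile_a, profile_b)
r_opposite = pearson(profile_a, profile_b[::-1])
```

A coefficient near 1 indicates that relative expression levels are preserved between the two populations, even if absolute values differ by a scale factor.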

Analysis of moHSPCs was performed at day 2 of the 
mobilization protocol, the peak of Cy/G-induced HSC 
expansion (Wright et al., 2001a, 2001b), when animals 
typically show a 3- to 5-fold increase in HSPC number 
(Forsberg et al., 2010; Passegue et al., 2005). We first consid- 
ered in aggregate the expression patterns of steady-state 
HSPCs and moHSPCs. This analysis revealed 15 genes 
exhibiting differential expression (FDR < 10%; Figures 6B
and 6C; Table S5). Of note was the upregulation of
extracellular and transmembrane proteases, including
Prtn3, which encodes a leukocyte serine protease (Proteinase 3)
that degrades elastin, fibronectin, laminin, vitronectin,
and collagen IV, and has been suggested to act as a
"path clearer" for neutrophil migration (Kuckleburg et al.,
2012). Although previous studies have implicated acti- 
vated immune cells as the primary effectors of proteolysis 
during HSPC mobilization, the upregulation of matrix pro- 
teases in moHSPCs suggests that autocrine proteolysis may 
also be important. To identify candidate transcriptional 
regulators of the moHSPC genes, we performed MEME 
analysis of proximal promoters (±1,000 bp of the TSS) of 
the 15 moHSPC genes, which revealed two significantly 
enriched motifs (Figure 6D). Ingenuity Pathway Analysis 
(IPA) was used to identify an additional set of factors with 
known binding sites (right-tailed Fisher's exact test, p < 
0.05; Figure 6E). To focus more specifically on the func- 
tional effectors of BM transplant, we next examined our 
data to identify genes that distinguish moHSC^SLAM and
moMPP^SLAM from their steady-state equivalents. Pairwise
analysis revealed 42 genes differentially expressed between
moHSC^SLAM and HSC^SLAM (fold change > 1.5; FDR < 10%;
Figure S6A; Table S6). This gene set was enriched for a number
of functional categories, including apoptosis and cell
adhesion, exocytosis and actin cytoskeleton organization,
and cell motility (Figure S6C). Proximal promoter analysis
of the 42 moHSC^SLAM genes identified three enriched
sequence motifs and corresponding regulators (Figure S6B).
In moMPP^SLAM, 182 genes were differentially expressed
(fold change > 1.5; FDR < 10%; Figure S6D; Table S7). IPA 
revealed enrichment of a number of categories, including 



cell cycle and cancer and cell movement and immune 
cell trafficking, among others (Figure S6F). 
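
The pairwise filter described above (fold change > 1.5; FDR < 10%) can be sketched as follows. The function name, record layout, gene names, and values are hypothetical, and the sketch assumes linear-scale, strictly positive mean expression values with FDR values computed beforehand.

```python
def differential_genes(records, min_fold=1.5, max_fdr=0.10):
    """Return (gene, direction, fold) for genes passing both thresholds.

    records is a list of (gene, mean_mobilized, mean_steady, fdr) tuples.
    Fold change is reported as a ratio >= 1 in the direction of change.
    """
    hits = []
    for gene, mobilized, steady, fdr in records:
        fold = max(mobilized, steady) / min(mobilized, steady)
        if fold > min_fold and fdr < max_fdr:
            direction = "up" if mobilized > steady else "down"
            hits.append((gene, direction, fold))
    return hits


# Hypothetical toy records.
toy_records = [
    ("geneA", 300.0, 100.0, 0.01),  # 3-fold up, significant
    ("geneB", 120.0, 100.0, 0.01),  # fold change too small
    ("geneC", 50.0, 200.0, 0.05),   # 4-fold down, significant
    ("geneD", 400.0, 100.0, 0.50),  # large change but fails FDR
]
hits = differential_genes(toy_records)
```

Requiring both thresholds removes genes with statistically significant but small changes as well as large changes that are poorly supported.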

Altogether, the genes identified through this analysis 
define a molecular signature associated with HSPC proliferation
and mobilization. Importantly, HSC^SLAM and
MPP^SLAM display remarkably similar transcription profiles
during mobilization, despite inherent differences in 
their self-renewal potential, thereby suggesting common 
targets in stem and progenitor cells whose manipula- 
tion can lead to perturbed proliferation, adhesion, and 
migration. 

Hlf Is a Positive Regulator of Multilineage Potential 
and Self-Renewal In Vitro 

A central goal in our analysis of HSC-specific expression 
patterns was to identify key regulators that modulate HSC 
fate and function. We chose Hlf for functional validation 
because it is one of the most strikingly HSC-specific genes 
(Figures 4B-4D) and was predicted by module analysis to 
be an HSC regulator (Figure 5C). Hlf encodes a PAR-bZIP 
transcription factor that is studied principally in the 
context of acute leukemia involving the t(17;19) translocation
that generates the oncogenic E2A-HLF fusion protein
(Hunger et al., 1992; Inaba et al., 1992). Ectopic expression 
of HLF was reported to enhance the short-term xenograft 
potential of human lineage-negative cord blood cells, sug- 
gesting an important role in HSPC biology (Shojaei et al., 
2005). We therefore constructed doxycycline-inducible 
Hlf and control lentiviruses containing an IRES-ZsGreen
reporter cassette, and transduced HSCs, MPPls, MPP2s, 
CMPs, GMPs and MEPs purified from mice expressing 
the reverse tet-transactivator, rtTA, at the Rosa26 locus 
(Hochedlinger et al., 2005). Transduced cells were cultured 
and immunostained at weekly intervals for lineage markers 
and CD150 (Slamf1) to monitor differentiation and evaluate
the presence of primitive hematopoietic progenitors.
Enforced expression of Hlf in HSCs caused a significant
percentage of cells to maintain a lin-CD150+ immunophenotype
during 3 weeks of ex vivo culturing, whereas
control-transduced HSCs quickly lost this primitive
immunophenotype and became lin+CD150- (Figure 7A).
Strikingly, Hlf was also able to induce a lin-CD150+ immunophenotype
in a number of downstream progenitors that
were initially sorted as CD150-, which was maintained
over several weeks of culturing (Figure 7A). After 30 days
of culture in the presence of doxycycline, the Hlf-transduced
cultures contained multiple myeloid cell types,
including megakaryocytes, macrophages, granulocytes, 
and undifferentiated cells, whereas the control cultures 
contained only macrophages (Figure 7B). In an indepen- 
dent experiment, ectopic expression of Hlf or HoxB4 in 
HSCs maintained mixed myeloid colony-forming potential 
after long-term (45 days) ex vivo culturing. In contrast,
Figure 7. HLF Is a Positive Regulator of Multipotency and Self-Renewal In Vitro

(A) Representative flow-cytometry plots showing CD11b and CD150 staining (left) and the time course (right) of the indicated HSPC
subsets transduced with control (ZsGreen) or HLF lentiviruses. Plots (left) were generated 2 weeks posttransduction in liquid culture. Cells
were pregated on lineage markers (CD3, B220, Ter119, and Gr1). Representative experiment with three biological replicates (±SEM). *p <
0.05.

(B) Cytospin showing representative cell types generated by HSCs transduced with control or HLF-expressing lentiviruses and maintained
in liquid culture for 30 days.

(C) Colony number and composition from HSCs transduced with control, HoxB4, or HLF-expressing lentiviruses and cultured for 45 days
prior to plating. Three biological replicates per sample (±SEM).

(D) Colony number and composition upon serial plating in methylcellulose of the indicated stem and progenitor cells transduced with
control or HLF-expressing lentiviruses. Three biological replicates per sample (±SEM).

See also Figure S7.



untransduced or control-transduced HSCs showed limited
colony-forming potential and an inability to maintain
mixed myeloid lineage potential. Thus, ectopic expression
of Hlf leads to the maintenance of mixed myeloid lineage
potential within HSC cultures even after prolonged
ex vivo culturing.

To further examine the functional potential of Hlf,
we sorted and transduced HSCs, MPPs, and
cKit+Sca1−lineage− myeloid progenitors (MyPros) with Hlf or
control virus, and assayed for colony-forming cell (CFC) ac- 
tivity in methylcellulose-based serial plating experiments. 
Both control and Hlf-transduced cells produced colonies
in the primary plating, although Hlf-transduced MPPs
and MyPros generated significantly more colonies (Fig- 
ure 7D). Secondary and tertiary plating revealed that only 
Hlf-transduced cells continued to robustly generate colonies,
whereas control-transduced cells lost activity, as expected.
Importantly, quantification of colony types further
revealed that Hlf expression conferred sustained multilineage
potential, as evidenced by the presence of CFU-GEMM
colonies at each plating (Figure 7D). Withdrawal of doxycycline
led to loss of CFC activity, indicating that continued
Hlf expression is necessary to sustain replating potential
(Figure S7). Taken together, these experiments demonstrate 
that Hlf can impart potent, sustained self-renewal activity 
on HSCs and downstream progenitors during ex vivo 
manipulation. 



DISCUSSION 

HSPCs include rapidly cycling progenitor cells that pro- 
duce vast numbers of effector cells on a daily basis. It 
was therefore not unexpected to find that genes involved 
in cell cycle and metabolic growth were enriched in 
HSPCs, but surprisingly, we also discovered that many of 
these genes are highly expressed in quiescent HSCs. 
Undoubtedly, this result is influenced in part by the fact 
that certain aspects of cell-cycle regulation occur posttranscriptionally.
Although it is possible that the small percentage
(~5%) of cycling HSCs (Passegué et al., 2005; Rossi
et al., 2007; Wilson et al., 2008; Yamazaki et al., 2006)
might account for all or most of the transcripts associated 
with cell cycle and metabolic growth, this possibility is 
unlikely to explain the high expression levels we observed. 
Robust expression of cell-cycle progression and metabolic 
growth genes in HSCs is consistent with the idea that, 
despite quiescence, these cells are primed for rapid activa- 
tion, possibly as a mechanism to allow for rapid cell-cycle 
entry in response to acute injury or stress. Moreover, these 
data suggest that the balance between HSC dormancy and 
activation is regulated, at least in part, posttranscription- 
ally. In support of this, p57, the CDK inhibitor that is 
responsible for maintaining HSC quiescence (Matsumoto 
et al., 2011; Zou et al., 2011), has been shown to localize 
to the cytoplasm along with CyclinD2 in quiescent 
HSCs, and upon cytokine stimulation p57 is rapidly 
degraded concomitantly with translocation of CyclinD2 
to the nucleus and entry into the cell cycle (Passegué
et al., 2005; Rossi et al., 2007; Wilson et al., 2008; Yamazaki
et al., 2006).

Delineating the transcriptional programs that underlie 
HSPC cell mobilization provides molecular insight into 
the regulation and function of cells whose robust activity 
is essential for the clinical success of hematopoietic cell 
transplantation. Retention of HSPCs within the stem cell 
"niche" is regulated in part by interactions between ligands
expressed in the niche and receptors on the surfaces of
HSPCs (such as SDF-1-CXCR4, TPO-MPL, and VLA-4-VCAM-1).
G-CSF treatment is thought to attenuate these
retention signals via stimulation of proteolytic enzymes 
to promote HSPC egress from BM into the circulation 
(Dar et al., 2006). Although activated myeloid cells are 
widely acknowledged as a primary source for such proteolytic
enzymes, our analysis unexpectedly identified HSPC-intrinsic
upregulation of several genes encoding extracellular
and transmembrane proteases, suggesting that HSPCs
may produce autocrine signals that promote their migra- 
tion in response to mobilizing signals. Intriguingly, many 
of the enriched moHSPC biological functions and molecu- 
lar pathways mirror those used by immune and/or cancer 
cells for attachment, migration, and homing (Tables S5, 
S6, and S7). Further elucidation of these common path- 
ways and the many as yet uncharacterized genes will 
enhance our understanding of stem and progenitor cells 
during mobilization, and may potentially lead to increased 
clinical efficacy of stem-cell-targeted therapies for hemato- 
poietic malignancies. 

Although several genes have been identified that regulate 
HSC self-renewal and quiescence, candidate regulators of 
hematopoietic multipotency remain elusive. Therefore, 
our identification of a large family of mostly unstudied 
KRAB domain-containing zinc-finger transcriptional regu- 
lators whose expression is enriched within the multipotent 
HSC compartment is intriguing. C2H2 zinc-finger proteins
bind DNA with each finger interacting with three or four 
bases (Urrutia, 2003). Because the genes we identified in 
this family encode proteins that contain three to 23 zinc 
fingers (average 12), they likely bind with great specificity 
within the genome. Strikingly, because each of these factors
also contains a KRAB domain, which canonically functions
to recruit proteins to mediate transcriptional repression,
the proteins we have identified are likely to act as
transcriptional repressors. Given their enriched expression 
in multipotent progenitors, we hypothesize that the collec- 
tive activity of these factors may be involved in maintain- 
ing hematopoietic multipotency, perhaps through active 
repression of lineage commitment and differentiation pro- 
grams. Interestingly, expression for most of these factors is 
not fully restricted to the multipotent stem (HSC) and pro- 
genitor (MPP) cell compartments, and instead is often 
maintained in one or more downstream lineages. Such
expression patterns would be consistent with the idea that
individual factors function to repress commitment to 
defined lineages and therefore must be maintained during 
commitment to opposing lineages as a means of 
preventing aberrant activation of gene programs associated 
with other lineages. For example, expression of the 
KRAB-containing zinc-finger genes Gm14420 and
A630089N07Rik is maintained from HSCs and MPPs 
through to MEPs, but is significantly diminished in progen- 
itors of other lineages, including GMPs, pre-B cells, and pre- 
T cells, suggesting that these two proteins may be involved 
in repressing genes associated with non-MEP cell fates. The 
hypothesis that these KRAB domain-containing regulators 
play a role in maintaining multipotency in HSPCs through 
suppression of differentiation pathways in either a general 
or lineage-specific manner remains to be experimentally 
tested. 

The mechanisms that regulate the central properties of 
HSCs are not fully understood. Using the vast resource of 
ImmGen, we sought to identify genes with enriched 
expression in HSCs, reasoning that such genes might repre- 
sent key regulators of stem cell fate and function. In 
support of this, we readily identified several known HSC 
regulators, including HoxB4, Erg, HoxA9, Meis1, Egr1, and
Mecom (Orkin and Zon, 2008), as well as genes that have 
not previously been implicated in HSC biology. Based on 
its HSC-specific expression and predicted regulatory role 
as determined by module analysis, we identified Hlf as a 
high-priority candidate for functional validation. We 
found that Hlf endowed HSCs and downstream progenitors
with enhanced self-renewal, and sustained long-term 
mixed myeloid lineage potential during ex vivo culturing. 
Interestingly, these results complement and extend a previous
report examining HLF expression in human HSPCs, in
which ectopic expression of HLF led to an increase in the 
short-term xenograft potential of human lineage-negative 
cord blood cells containing HSCs and all of their down- 
stream progenitor progeny (Shojaei et al., 2005). Our 
finding that HLF can impart potent and sustained self- 
renewal activity to HSPCs ex vivo suggests that increased 
self-renewal of HSPCs may underlie the observations re- 
ported by Shojaei et al. (2005). 

The insights this study provides into the transcriptional 
regulation of HSCs, combined with the identification of 
HSC-specific transcription factors, could eventually lead 
to the development of combinatorial strategies aimed at 
inducing HSC potential in nonstem cells in a manner 
similar to that used for the reprogramming of other cell 
types (Graf and Enver, 2009). Moreover, our findings 
regarding the transcriptional programs that regulate the 
central properties of HSCs not only provide insights into 
the basic biology of these cells but may also illuminate 
innovative strategies to improve their clinical utility. 



EXPERIMENTAL PROCEDURES 

Sorting HSPCs 

Immunophenotypes of HSPC subsets are shown in Table S1. Cytokine-induced
mobilization of HSPCs was performed as previously
described (Passegué et al., 2005). Experimental cell-sorting and
processing schemes are available at
https://www.immgen.org/index_content.html.

Microarray and Informatic Analysis 

ImmGen V1 samples were not amplified prior to microarray
hybridization, except for those cells obtained in the mobilization 
studies presented herein, which were amplified (Genisphere) 
prior to hybridization. For this reason, these data sets were 
normalized and analyzed independently of the broader ImmGen 
data set. The numbers of microarrays utilized for stem and 
progenitor cell populations are as follows: HSC (3), MPP1 (2),
MPP2 (2), CMP (3), MEP (2), GMP (3), CLP (4), pre-B cells (3), 
and pre-T cells (3). In order to identify genes with enriched 
expression in different hematopoietic subsets, one-way ANOVA 
was implemented by the MATLAB function anova1. Secondary
analysis was implemented by the MATLAB function multcompare.
Gene lists were subjected to standard enrichment
analysis through DAVID (Huang et al., 2009). For the mobilized 
HSC and MPP comparisons, biological functions and
molecular pathway analysis association networks were generated 
by IPA software (v8.7; Ingenuity Systems). Significance in the 
data set analyzed by IPA was determined by a right-tailed Fisher's 
exact test (p < 0.05) using the whole IPA knowledge base as a 
reference set. To generate heatmaps, GCT files of the selected genes
were visualized through GenePattern. Module analysis was done
as previously described (Jojic et al., 2013).
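
The two statistical steps named above, a one-way ANOVA across cell populations and a right-tailed Fisher's exact test for enrichment, can be illustrated with a minimal Python sketch. This is not the MATLAB or IPA code used in the study. The function names (one_way_anova_f, fisher_right_tailed), their parameters, and any input values are assumptions introduced only for illustration. The ANOVA function returns the F statistic only; converting it to a p value requires the F distribution, which is omitted here.

```python
from math import comb


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic.

    groups: list of lists, one list of expression values per cell population
    (hypothetical input layout, not the study's data format).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def fisher_right_tailed(overlap, list_size, category_size, reference_size):
    """Right-tailed Fisher's exact test: P(X >= overlap) under the hypergeometric null.

    overlap: genes in both the gene list and the annotation category
    list_size: genes in the gene list
    category_size: genes in the annotation category
    reference_size: genes in the reference set
    """
    upper = min(list_size, category_size)
    total = comb(reference_size, list_size)
    tail = sum(
        comb(category_size, x) * comb(reference_size - category_size, list_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total
```

A gene list would be called enriched for a category when fisher_right_tailed returns a value below the chosen threshold (p < 0.05 in the text).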

Gene Network Prediction 

A functional gene network was constructed using the full set of 
ImmGen microarrays (March 2010 release). The gene-by-microar- 
ray matrix of expression values was taken as input to the CLR 
algorithm (Faith et al., 2007). Briefly, CLR calculates a mutual in- 
formation matrix of all pairwise gene-by-gene expression profiles, 
where an expression profile is defined as the vector of log2-trans- 
formed expression values across all ImmGen cell populations. 
For each individual gene, the distribution of mutual information 
values is Z transformed to derive a normal distribution. Back- 
ground correction for each gene is applied using Stouffer's Z-score 
method to combine Z scores. An FDR is calculated for each of these 
values and an edge is drawn between two genes if the calculated 
FDR < 1 X 10~^. Cytoscape was used for network visualization 
(Shannon et al., 2003). 
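
The scoring steps described above can be sketched in Python as follows. This is an illustrative reading of the text, not the CLR implementation used in the study: the mutual information matrix is taken as a precomputed input, each gene's values are Z transformed against that gene's own background, and the two Z scores for a gene pair are combined with Stouffer's method. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg adjustment below is an assumption, as are all function names.

```python
import math
from statistics import mean, pstdev


def clr_scores(mi):
    """Combine per-gene Z scores of mutual information (MI) for every gene pair.

    mi: symmetric n x n list of lists of MI values (diagonal ignored).
    Returns an n x n matrix of Stouffer-combined Z scores.
    """
    n = len(mi)
    # Background for each gene: mean and SD of its MI values with all other genes
    background = []
    for i in range(n):
        row = [mi[i][j] for j in range(n) if j != i]
        background.append((mean(row), pstdev(row)))
    z = [[0.0] * n for _ in range(n)]
    for i in range(n):
        mu_i, sd_i = background[i]
        for j in range(i + 1, n):
            mu_j, sd_j = background[j]
            z_i = (mi[i][j] - mu_i) / sd_i if sd_i > 0 else 0.0
            z_j = (mi[i][j] - mu_j) / sd_j if sd_j > 0 else 0.0
            # Stouffer's method for two Z scores
            z[i][j] = z[j][i] = (z_i + z_j) / math.sqrt(2)
    return z


def upper_tail_p(z_score):
    """One-sided p value of a standard normal Z score."""
    return 0.5 * math.erfc(z_score / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvals[k] * m / rank)
        adjusted[k] = running_min
    return adjusted
```

An edge between two genes would then be kept when the adjusted value for that pair falls below the chosen FDR cutoff.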

Motif Analysis 

Proximal promoter sequences (±1,000 bp of TSS) were retrieved
from Ensembl BioMart (Kinsella et al., 2011) using the NCBI
v37 mouse genome assembly. MEME-chromatin immunoprecipitation
(MEME-ChIP) (Machanick and Bailey, 2011) was used to
identify enriched sequence motifs between 6 and 30 bp. 
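
As an illustration of the promoter window defined above, the following Python sketch computes the coordinates of a TSS ± 1,000 bp interval and extracts the corresponding sequence from a chromosome string. It does not reproduce the BioMart query used in the study. The function names, the 1-based inclusive coordinate convention, and the reverse-complementing of minus-strand genes are assumptions made for this example.

```python
# Translation table for complementing DNA bases (N is left as N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def promoter_window(tss, flank=1000, chrom_length=None):
    """Return 1-based inclusive (start, end) spanning TSS +/- flank.

    The window is clipped at position 1 and, if given, at chrom_length.
    """
    start = max(1, tss - flank)
    end = tss + flank
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


def promoter_sequence(chrom_seq, tss, strand, flank=1000):
    """Extract the promoter window from chrom_seq, oriented by gene strand."""
    start, end = promoter_window(tss, flank, len(chrom_seq))
    seq = chrom_seq[start - 1:end]  # convert 1-based inclusive to a Python slice
    if strand == "-":
        # Report minus-strand promoters as the reverse complement
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq
```

The resulting sequences, one per gene, could then be written to a FASTA file as input for motif discovery.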

Functional Assays

Hlf (MGI:96108) was cloned into the pHAGE2 lentivirus (Mostoslavsky
et al., 2005) under a TRE promoter. Cells were double sorted
for purity and transduced at a multiplicity of infection of 100. For
in vitro immunophenotype assays, cells were cultured in Dulbecco's
modified Eagle's medium/F-12 media supplemented with
doxycycline (1 μg/ml), L-glutamine, pen-strep, nonessential
amino acids, beta-mercaptoethanol, 10% fetal bovine serum, and 
the cytokines thrombopoietin, stem cell factor, interleukin-3, 
and flt3L (each at 10 ng/ml). For CFC assays, cells were transduced
and cultured in liquid media for 2 days, and then transduced cells
were sorted and plated in M3434 methylcellulose media (Stem Cell 
Technologies) at 250 cells per well. Colony number and type were 
quantified on day 9 or 10, followed by serial replating of 10,000 
cells per well. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes seven figures and seven tables 
and can be found with this article online at
http://dx.doi.org/10.1016/j.stemcr.2013.07.004.